Abstract

A general principle called "conservation of the ellipsoid of concentration" is introduced
and a generalized entropic form of order $\alpha$ is optimized under this principle. It is shown
that this can produce a density which can act as a pathway to the multivariate Gaussian
density. The resulting entropic pathway contains as special cases the Boltzmann-Gibbs
(Shannon) and Tsallis (Havrda-Charvát) entropic forms.
1 Introduction

The normal (Gaussian) distribution is a family of continuous probability distributions
and is ubiquitous in the field of statistics and probability (Feller [4]). The importance of
the normal distribution as a model of quantitative phenomena is due to the central limit
theorem. The normal distribution maximizes Shannon entropy among all distributions
with known mean and variance. In information theory, Shannon entropy is the measure
of uncertainty associated with a random variable.

In statistical mechanics, the Gaussian (Maxwell-Boltzmann) distribution maximizes the
Boltzmann-Gibbs entropy under appropriate constraints (Gell-Mann and Tsallis [7]).
Given a probability distribution $P = \{p_i\}$ $(i = 1, \ldots, N)$, with $p_i$ representing the
probability of the system to be in the $i$th microstate, the Boltzmann-Gibbs entropy is
$S(P) = -k \sum_{i=1}^{N} p_i \ln p_i$, where $k$ is the Boltzmann constant and $N$ the total number
of microstates. If all states are equally probable it leads to the Boltzmann principle
$S = k \ln W$ $(N = W)$. Boltzmann-Gibbs entropy is equivalent to Shannon's entropy if
$k = 1$.
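As a small numerical illustration of the two statements above, the following sketch evaluates the Boltzmann-Gibbs entropy for a discrete distribution and checks that the uniform case reduces to $k \ln W$. The function name `boltzmann_gibbs_entropy` and the choice $W = 8$, $k = 1$ are assumptions made only for this example.

```python
import math

def boltzmann_gibbs_entropy(p, k=1.0):
    """Return S(P) = -k * sum_i p_i ln p_i for a discrete distribution p."""
    if abs(sum(p) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to 1")
    # Terms with p_i = 0 contribute nothing (limit of p ln p as p -> 0).
    return -k * sum(pi * math.log(pi) for pi in p if pi > 0.0)

# Equally probable microstates: S should equal k ln W with W = N.
W = 8
uniform = [1.0 / W] * W
print(boltzmann_gibbs_entropy(uniform), math.log(W))
```

With $k = 1$ the same function is Shannon's entropy measured in natural units.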

A generalization of Boltzmann-Gibbs extensive statistical mechanics is known as Tsallis
non-extensive statistical mechanics (Swinney and Tsallis [5], Abe and Okamoto [6]).
Tsallis discovered the generalization of Shannon's entropy to non-extensivity as
$S(P, q) = \left(\sum_{i=1}^{N} p_i^q - 1\right)/(1 - q)$. For $q \to 1$, Shannon's entropy is recovered. Tsallis introduced
$q$-probabilities accommodating the fact that non-extensive systems are better described
by power law distributions, $p_i^q$, now called $q$-probabilities. The $p_i^q$ are scaled probabilities
where $q$ is a real parameter.
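The limit $q \to 1$ can be checked numerically. The sketch below is only an illustration: the function names and the test distribution $(0.5, 0.3, 0.2)$ are assumptions, and the Boltzmann constant is set to 1.

```python
import math

def shannon_entropy(p):
    """Shannon entropy -sum_i p_i ln p_i (natural logarithm)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def tsallis_entropy(p, q):
    """S(P, q) = (sum_i p_i^q - 1) / (1 - q), defined for q != 1."""
    if q == 1.0:
        raise ValueError("use shannon_entropy for q = 1")
    return (sum(pi ** q for pi in p if pi > 0.0) - 1.0) / (1.0 - q)

p = [0.5, 0.3, 0.2]
# Values of q approaching 1 from both sides should approach the Shannon value.
for q in (0.5, 0.9, 0.999, 1.001, 1.1, 2.0):
    print(q, tsallis_entropy(p, q))
print("Shannon:", shannon_entropy(p))
```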

This paper, in Section 2, introduces a general principle called conservation of the
ellipsoid of concentration. In Section 3 it maximizes, under this principle, a generalized
entropic form of order $\alpha$ that contains the Shannon (Boltzmann-Gibbs), Rényi, and
Havrda-Charvát (Tsallis) entropies as special cases. Normalizing constants are derived in Section 3.1 and
the mean value and covariance matrix in Section 3.2 for the cases $\alpha < 1$, $\alpha \to 1$, and $\alpha > 1$.
The pathway, characterized by $\alpha$, is shown to produce multivariate type-1 beta, Gaussian,
and type-2 beta densities, respectively. In Section 3.3 a graphical representation of the
pathway surface is shown. Section 4 draws conclusions.



2 Conservation of the ellipsoid of concentration 

Consider a $q \times 1$ vector $X$, $X' = (x_1, \ldots, x_q)$, where a prime denotes the transpose. The
components $x_1, \ldots, x_q$ may be real scalar mathematical variables or random variables
describing various components in a physical system. Each component in $X$ can be assumed
to have a finite mean value and variance. If $E$ denotes the expected value, the value on
the average in the long run, then we can assume $E(x_i) = \mu_i < \infty$ for $i = 1, \ldots, q$. Let
$\mu' = (\mu_1, \ldots, \mu_q)$. Similarly one can assume the expected dispersion in each component
to be finite. The square of a measure of dispersion is given by the variance, $\mathrm{Var}(x_i)$.
That is, $\mathrm{Var}(x_i) < \infty$. The components may be correlated or may have pair-wise joint
variations. A measure of pair-wise joint variation is the covariance between $x_i$ and $x_j$,
$\mathrm{Cov}(x_i, x_j) = E\{[x_i - E(x_i)][x_j - E(x_j)]\} = v_{ij}$, so that when $i = j$ we have $\mathrm{Var}(x_i) = v_{ii}$.
The matrix of such variances and covariances is the covariance matrix of $X$, denoted by
$\mathrm{Cov}(X) = E\{(X - E(X))(X - E(X))'\} = V = (v_{ij})$. Note that $V$ is real symmetric when
$X$ is real, and $V$ is at least non-negative definite. Let us assume that no
component in the $q \times 1$ vector $X$ is a linear function of other components so that we
can take $V$ to be nonsingular. This will then imply that $V$ is positive definite. That is,
$V = V' > 0$. Let $V^{1/2}$ be the positive definite square root of the positive definite matrix
$V$.

Standardization of a component $x_i$ is achieved by relocating it at $\mu_i$ and by rescaling it,
taking $y_i = \frac{x_i - \mu_i}{\sqrt{\mathrm{Var}(x_i)}}$ so that $E(y_i) = 0$ and $\mathrm{Var}(y_i) = 1$. Similarly, standardization of
the $q \times 1$ vector $X$ is achieved by a linear transformation on $X - \mu$, namely, $Y = V^{-1/2}(X - \mu)$,
so that $E(Y) = O$ and $\mathrm{Cov}(Y) = I$ where $I$ is the identity matrix. The Euclidean norm in
$Y$ is then $[Y'Y]^{1/2} = [(X - \mu)'V^{-1}(X - \mu)]^{1/2}$. The scalar quantity $(X - \mu)'V^{-1}(X - \mu)$ has
many interpretations in different disciplines. A measure of distance between $X$ and $\mu$ is
any norm $\|X - \mu\|$. But if we want to accommodate the joint variations in the components
as well as the fact that the variances of the components may be different, then
we consider a generalized distance between $X$ and $\mu$. One such square of the generalized
distance is the square of the Euclidean norm in $Y$, or $Y'Y = (X - \mu)'V^{-1}(X - \mu)$. For
a given constant $c > 0$, $(X - \mu)'V^{-1}(X - \mu) = c$ defines the surface of an ellipsoid since
$V$ is positive definite. This ellipsoid is known as the ellipsoid of concentration of $X$
around its expected value $\mu$. If we assume that $c$ is fixed, for example $c = 1$, which implies
$(X - \mu)'V^{-1}(X - \mu) = 1$, then this assumption is equivalent to saying that the standardized
$X$, namely $Y$, is a point on the surface of a hypersphere of radius 1. When it is assumed
that the ellipsoid of concentration is a fixed finite quantity, what we are saying is that the
generalized distance of $X$ from $\mu$ is fixed and finite. This is the principle of conservation
of the ellipsoid of concentration.
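The identity between the Euclidean norm of the standardized vector and the generalized distance can be illustrated for $q = 2$. The sketch below is a hedged example, not part of the original derivation: the helper names, the matrix `V`, and the points `mu` and `x` are all illustrative assumptions. It uses the closed form for the symmetric square root of a $2 \times 2$ positive definite matrix, $V^{1/2} = (V + sI)/t$ with $s = \sqrt{\det V}$ and $t = \sqrt{\operatorname{tr} V + 2s}$.

```python
import math

def sqrt_spd_2x2(V):
    """Symmetric positive definite square root of a 2x2 positive definite matrix."""
    (a, b), (c, d) = V
    s = math.sqrt(a * d - b * c)          # sqrt(det V)
    t = math.sqrt(a + d + 2.0 * s)        # trace of the square root
    return [[(a + s) / t, b / t], [c / t, (d + s) / t]]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(M, x):
    return [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]

def generalized_distance_sq(x, mu, V):
    """Squared generalized distance (X - mu)' V^{-1} (X - mu)."""
    d = [x[0] - mu[0], x[1] - mu[1]]
    w = matvec(inverse_2x2(V), d)
    return d[0] * w[0] + d[1] * w[1]

def standardize(x, mu, V):
    """Standardized vector Y = V^{-1/2} (X - mu)."""
    d = [x[0] - mu[0], x[1] - mu[1]]
    return matvec(inverse_2x2(sqrt_spd_2x2(V)), d)

V = [[4.0, 1.2], [1.2, 1.0]]   # illustrative positive definite matrix
mu = [1.0, -2.0]
x = [2.5, -1.0]
y = standardize(x, mu, V)
# Y'Y and the generalized distance should agree up to rounding.
print(y[0] ** 2 + y[1] ** 2, generalized_distance_sq(x, mu, V))
```

Fixing the printed value to a constant $c$ and varying `x` traces out the ellipsoid of concentration in the $X$ coordinates and a circle of radius $\sqrt{c}$ in the $Y$ coordinates.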



3 Generalized entropic form of order $\alpha$

Let $f(X)$ be a real-valued scalar function of $X$ where $X$ could be a scalar quantity or a
$q \times 1$ vector, $q > 1$, or a $p \times q$ matrix, $p > 1$, $q > 1$. Let us assume that the elements in $X$ are
real scalar random variables. Then $f(X)$ can define a density provided $\int_X f(X)\,dX = 1$
and $f(X) \ge 0$ for all $X$. If $\int_X f(X)\,dX = h < \infty$ then $g(X) = f(X)/h$ is a density
provided $f(X) \ge 0$ for all $X$. Here $dX$ denotes the wedge product of the differentials in
$X$. For example, $dX = dx_{11} \wedge dx_{12} \wedge \cdots \wedge dx_{1q} \wedge dx_{21} \wedge \cdots \wedge dx_{pq}$ if $X$ is $p \times q$ and all
elements in $X$ are functionally independent. The uncertainty or information in
$X$ or in $f(X)$ is measured by the Shannon entropy, defined by

$$S(f) = -\int_X f(X) \ln f(X)\,dX \tag{3.1}$$

when $f$ is continuous, where $X$ may be a scalar, a vector, or a general matrix and $f$ is the
density of $X$. There are generalizations of $S(f)$, some of which are listed in Mathai and
Rathie [1]. Some of these are the following (Mathai and Haubold [2]):

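Rényi's entropy:

$$R_\alpha(f) = \frac{\ln\left[\int_X \{f(X)\}^{\alpha}\,dX\right]}{1 - \alpha}, \quad \alpha \ne 1,\ \alpha > 0$$

Havrda-Charvát entropy:

$$H_\alpha(f) = \frac{\int_X \{f(X)\}^{\alpha}\,dX - 1}{2^{1-\alpha} - 1}, \quad \alpha \ne 1,\ \alpha > 0$$

Tsallis non-extensive entropy:

$$T_\alpha(f) = \frac{\int_X \{f(X)\}^{\alpha}\,dX - 1}{1 - \alpha}, \quad \alpha \ne 1,\ \alpha > 0$$

Non-extensive generalized entropic form:

$$M_\alpha(f) = \frac{\int_X \{f(X)\}^{2-\alpha}\,dX - 1}{\alpha - 1}, \quad \alpha \ne 1,\ \alpha < 2$$

Extensive generalized entropic form:

$$M^*_\alpha(f) = \frac{\ln\left[\int_X \{f(X)\}^{2-\alpha}\,dX\right]}{\alpha - 1}, \quad \alpha \ne 1,\ \alpha < 2.$$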
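To see how the generalized entropic forms relate to the Shannon entropy in (3.1), the following sketch evaluates the non-extensive and extensive generalized forms for a one-dimensional standard normal density and lets $\alpha$ approach 1. It is an illustration under stated assumptions: the scalar case, the midpoint-rule integrator, the truncation of the real line to $[-12, 12]$, and all function names are choices made for this example only.

```python
import math

def integrate(g, lo, hi, n=20000):
    """Composite midpoint rule for a scalar function g on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def normal_pdf(x):
    """Standard normal density, used here as the test density f."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def shannon(f, lo=-12.0, hi=12.0):
    """S(f) = -int f ln f dx, the scalar case of (3.1)."""
    return -integrate(lambda x: f(x) * math.log(f(x)), lo, hi)

def nonextensive_form(f, alpha, lo=-12.0, hi=12.0):
    """(int f^(2 - alpha) dx - 1) / (alpha - 1), for alpha != 1, alpha < 2."""
    return (integrate(lambda x: f(x) ** (2.0 - alpha), lo, hi) - 1.0) / (alpha - 1.0)

def extensive_form(f, alpha, lo=-12.0, hi=12.0):
    """ln(int f^(2 - alpha) dx) / (alpha - 1), for alpha != 1, alpha < 2."""
    return math.log(integrate(lambda x: f(x) ** (2.0 - alpha), lo, hi)) / (alpha - 1.0)

S = shannon(normal_pdf)
for alpha in (0.5, 0.9, 0.99, 1.01, 1.5):
    print(alpha, nonextensive_form(normal_pdf, alpha), extensive_form(normal_pdf, alpha))
print("Shannon:", S)
```

Both forms should move toward the Shannon value as $\alpha$ approaches 1 from either side, which is the sense in which they generalize (3.1).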

Let us look into the problem of optimizing the non-extensive generalized entropic form
$M_\alpha(f)$ under the principle of the conservation of the ellipsoid of concentration. That is,
we optimize $M_\alpha(f)$ over all functionals $f$, subject to the conditions

$$\text{(i)}\ \int_X f(X)\,dX = 1; \qquad \text{(ii)}\ \int_X (X - \mu)'V^{-1}(X - \mu) f(X)\,dX = \text{constant}$$

with $f(X) \ge 0$ for all $X$. If we apply the calculus of variations, the Euler equation
becomes

$$\frac{\partial}{\partial f}\left[f^{2-\alpha} - \lambda_1 f + \lambda_2 (X - \mu)'V^{-1}(X - \mu) f\right] = 0, \quad \alpha < 2$$

where $\lambda_1$ and $\lambda_2$ are Lagrangian multipliers, observing the fact that since $\alpha$ is fixed,
optimization of $\frac{f^{2-\alpha}}{\alpha - 1}$ is equivalent to optimizing $f^{2-\alpha}$ over all functionals $f$. That is,



$$f^{1-\alpha} = \frac{\lambda_1}{2 - \alpha}\left[1 - \frac{\lambda_2}{\lambda_1}(X - \mu)'V^{-1}(X - \mu)\right].$$



Either by taking $\frac{\lambda_2}{\lambda_1} = a(1 - \alpha)$, $a > 0$, or by taking the second condition to be that the expected
value of $(1 - \alpha)(X - \mu)'V^{-1}(X - \mu)$ is 1, where $1 - \alpha$ denotes the strength of information
in $f(X)$ (see Mathai and Haubold [2]), we have

$$f = A\left[1 - a(1 - \alpha)(X - \mu)'V^{-1}(X - \mu)\right]^{\frac{1}{1-\alpha}} \tag{3.2}$$

where $A$ is the normalizing constant and $1 - a(1 - \alpha)(X - \mu)'V^{-1}(X - \mu) > 0$. Observe that
when $\alpha < 1$ the form in (3.2) is that of a multivariate type-1 beta type density. When
$\alpha > 1$, writing $1 - \alpha = -(\alpha - 1)$ we have

$$f = A\left[1 + a(\alpha - 1)(X - \mu)'V^{-1}(X - \mu)\right]^{-\frac{1}{\alpha-1}}, \quad \alpha > 1,\ a > 0. \tag{3.3}$$

Note that (3.3) is a multivariate type-2 beta type density. But when $\alpha \to 1$ in (3.2) and
(3.3) we have the form

$$f = A e^{-a(X - \mu)'V^{-1}(X - \mu)}. \tag{3.4}$$

Note that the constants $A$ in (3.2), (3.3) and (3.4) are different and are to be evaluated separately
for the three cases $\alpha < 1$, $\alpha > 1$ and $\alpha \to 1$. Thus (3.2) and (3.3) provide a pathway
to the multivariate Gaussian density in (3.4). The normalizing constant in (3.4), or in
(3.2) and (3.3) when $\alpha \to 1$, is

$$A = \frac{a^{q/2}}{\pi^{q/2}|V|^{1/2}} = \frac{1}{(2\pi)^{q/2}|V|^{1/2}} \quad \text{for } a = \tfrac{1}{2}. \tag{3.5}$$
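The passage from (3.2) and (3.3) to the exponential form (3.4) rests on the elementary limit $[1 - a(1-\alpha)z]^{1/(1-\alpha)} \to e^{-az}$ as $\alpha \to 1$, where $z$ stands for the quadratic form $(X - \mu)'V^{-1}(X - \mu) \ge 0$. The sketch below checks this numerically for the unnormalized kernel only; the function name `pathway_kernel` and the sample values of $z$ are assumptions for illustration.

```python
import math

def pathway_kernel(z, alpha, a=0.5):
    """Unnormalized pathway kernel as a function of z = (X - mu)' V^{-1} (X - mu)."""
    if alpha == 1.0:
        return math.exp(-a * z)              # Gaussian form, as in (3.4)
    # One bracket covers both (3.2) and (3.3): for alpha > 1 it equals
    # 1 + a*(alpha - 1)*z and the exponent equals -1/(alpha - 1).
    base = 1.0 - a * (1.0 - alpha) * z
    if base <= 0.0:
        return 0.0                           # outside the support (only possible for alpha < 1)
    return base ** (1.0 / (1.0 - alpha))

for z in (0.0, 0.5, 2.0, 5.0):
    print(z,
          pathway_kernel(z, 0.9),            # type-1 beta side
          pathway_kernel(z, 1.0),            # Gaussian
          pathway_kernel(z, 1.1))            # type-2 beta side
```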
3.1 The normalizing constant A

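Let us consider the case $\alpha < 1$ first. Since the total integral is 1 we have

$$\begin{aligned}
1 &= \int_X f(X)\,dX \\
&= A|V|^{1/2} \int_Y \left[1 - a(1-\alpha)(y_1^2 + \cdots + y_q^2)\right]^{\frac{1}{1-\alpha}} dY, \quad Y = V^{-1/2}(X - \mu) \Rightarrow dX = |V|^{1/2}\,dY, \\
&= A\,2^q |V|^{1/2} \int \cdots \int \left[1 - a(1-\alpha)(y_1^2 + \cdots + y_q^2)\right]^{\frac{1}{1-\alpha}} dY,
\end{aligned}$$

where the last integral is over $y_j > 0$, $j = 1, \ldots, q$, $1 - a(1-\alpha)(y_1^2 + \cdots + y_q^2) > 0$.
Put $u_j = a(1-\alpha)y_j^2 \Rightarrow dy_j = \frac{u_j^{\frac{1}{2}-1}}{2[a(1-\alpha)]^{1/2}}\,du_j$, $\alpha < 1$. Then

$$\begin{aligned}
1 &= \frac{A|V|^{1/2}}{[a(1-\alpha)]^{q/2}} \int \cdots \int u_1^{\frac{1}{2}-1} \cdots u_q^{\frac{1}{2}-1} (1 - u_1 - \cdots - u_q)^{\frac{1}{1-\alpha}}\, du_1 \wedge \cdots \wedge du_q \\
&= \frac{A|V|^{1/2}}{[a(1-\alpha)]^{q/2}} \cdot \frac{[\Gamma(\frac{1}{2})]^q\, \Gamma\!\left(\frac{1}{1-\alpha} + 1\right)}{\Gamma\!\left(\frac{1}{1-\alpha} + 1 + \frac{q}{2}\right)},
\end{aligned}$$

where the integral is over $1 - u_1 - \cdots - u_q > 0$, $0 < u_j < 1$, $j = 1, \ldots, q$,
by evaluating the integral with the help of a type-1 Dirichlet integral (Mathai [3]). Thus

$$A = \frac{[a(1-\alpha)]^{q/2}\, \Gamma\!\left(\frac{1}{1-\alpha} + 1 + \frac{q}{2}\right)}{|V|^{1/2} \pi^{q/2}\, \Gamma\!\left(\frac{1}{1-\alpha} + 1\right)} \quad \text{for } \alpha < 1. \tag{3.6}$$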

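For $\alpha > 1$, writing $1 - \alpha = -(\alpha - 1)$ and proceeding as above, and then finally evaluating
the integral with the help of a type-2 Dirichlet integral (Mathai [3]), we have

$$A = \frac{[a(\alpha-1)]^{q/2}\, \Gamma\!\left(\frac{1}{\alpha-1}\right)}{|V|^{1/2} \pi^{q/2}\, \Gamma\!\left(\frac{1}{\alpha-1} - \frac{q}{2}\right)}, \quad \frac{1}{\alpha-1} - \frac{q}{2} > 0,\ \alpha > 1. \tag{3.7}$$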

When $\alpha \to 1$, do (3.6) and (3.7) go to (3.5)? This can be checked with the help of Stirling's
formula, which states that for $|z| \to \infty$ and $\epsilon$ a bounded quantity,

$$\Gamma(z + \epsilon) \approx \sqrt{2\pi}\, z^{z + \epsilon - \frac{1}{2}} e^{-z}. \tag{3.8}$$
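The quality of the approximation in (3.8) can be inspected directly. The sketch below compares the logarithm of the Stirling expression with `math.lgamma` and prints the ratio of the exact value to the approximation; the function name and the sample values of $z$ and $\epsilon$ are assumptions for this example.

```python
import math

def log_gamma_stirling(z, eps):
    """ln of sqrt(2 pi) * z^(z + eps - 1/2) * e^(-z), the right-hand side of (3.8)."""
    return 0.5 * math.log(2.0 * math.pi) + (z + eps - 0.5) * math.log(z) - z

# Ratio Gamma(z + eps) / Stirling approximation; it should approach 1 as z grows.
for z in (10.0, 100.0, 1000.0):
    for eps in (1.0, 2.0):
        ratio = math.exp(math.lgamma(z + eps) - log_gamma_stirling(z, eps))
        print(z, eps, ratio)
```

Working with logarithms avoids overflow, since $\Gamma(z + \epsilon)$ itself is far outside floating-point range for large $z$.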

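Note that for $\alpha < 1$ and when $\alpha \to 1$, $\frac{1}{1-\alpha} \to \infty$. Then applying Stirling's formula to
$\Gamma\!\left(\frac{1}{1-\alpha} + 1 + \frac{q}{2}\right)$ and $\Gamma\!\left(\frac{1}{1-\alpha} + 1\right)$ in (3.6) we have

$$A \to \frac{[a(1-\alpha)]^{q/2}}{|V|^{1/2}\pi^{q/2}} \cdot \frac{\sqrt{2\pi}\left(\frac{1}{1-\alpha}\right)^{\frac{1}{1-\alpha} + 1 + \frac{q}{2} - \frac{1}{2}} e^{-\frac{1}{1-\alpha}}}{\sqrt{2\pi}\left(\frac{1}{1-\alpha}\right)^{\frac{1}{1-\alpha} + 1 - \frac{1}{2}} e^{-\frac{1}{1-\alpha}}} = \frac{a^{q/2}}{|V|^{1/2}\pi^{q/2}}$$

which is the value of $A$ in (3.5). Thus when $\alpha$ approaches 1 from the left, (3.6) goes to
(3.5). Similarly we can see that (3.7) also goes to (3.5) when $\alpha \to 1$ from the right. This
constitutes the pathway to the multivariate Gaussian density.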
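The convergence of the constants in (3.6) and (3.7) to the Gaussian constant in (3.5) can also be checked without Stirling's formula. The sketch below evaluates $\ln A$ with `math.lgamma`, reading (3.5) in its general-$a$ form $A = a^{q/2}/(\pi^{q/2}|V|^{1/2})$; the function name `log_norm_const` and the default values $a = 1/2$, $q = 2$, $|V| = 1$ are assumptions for illustration.

```python
import math

def log_norm_const(alpha, a=0.5, q=2, det_V=1.0):
    """ln A for alpha = 1 (3.5), alpha < 1 (3.6), and alpha > 1 (3.7)."""
    common = -0.5 * math.log(det_V) - 0.5 * q * math.log(math.pi)
    if alpha == 1.0:
        return common + 0.5 * q * math.log(a)
    if alpha < 1.0:
        eta = 1.0 / (1.0 - alpha)
        return (common + 0.5 * q * math.log(a * (1.0 - alpha))
                + math.lgamma(eta + 1.0 + 0.5 * q) - math.lgamma(eta + 1.0))
    eta = 1.0 / (alpha - 1.0)
    if eta - 0.5 * q <= 0.0:
        raise ValueError("need 1/(alpha - 1) - q/2 > 0")
    return (common + 0.5 * q * math.log(a * (alpha - 1.0))
            + math.lgamma(eta) - math.lgamma(eta - 0.5 * q))

gaussian = math.exp(log_norm_const(1.0))
for alpha in (0.5, 0.9, 0.99, 0.999, 1.001, 1.01, 1.1, 1.5):
    print(alpha, math.exp(log_norm_const(alpha)), gaussian)
```

The printed constants should approach the Gaussian value from both sides as $\alpha$ nears 1.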

3.2 The mean value and covariance matrix of X in (3.2) 



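$$\begin{aligned}
E(X) &= \int_X X f(X)\,dX = \mu \int_X f(X)\,dX + \int_X (X - \mu) f(X)\,dX \\
&= \mu + A|V|^{1/2} V^{1/2} \left\{\int_Y Y\left[1 - a(1-\alpha)Y'Y\right]^{\frac{1}{1-\alpha}} dY\right\},
\end{aligned}$$

since $\int_X f(X)\,dX = 1$ and since $X - \mu = V^{1/2}Y$ when $Y = V^{-1/2}(X - \mu) \Rightarrow dX = |V|^{1/2}\,dY$.
But $Y[1 - a(1-\alpha)Y'Y]^{\frac{1}{1-\alpha}}$ is an odd function and hence the integral over $Y$ is null. Hence
$E(X) = \mu$.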

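$$\begin{aligned}
\mathrm{Cov}(X) &= E\{(X - E(X))(X - E(X))'\} \\
&= E\{(X - \mu)(X - \mu)'\} \\
&= V^{1/2}\{E(YY')\}V^{1/2}, \quad Y = V^{-1/2}(X - \mu) \\
&= A|V|^{1/2} V^{1/2}\left\{\int_Y YY'\left[1 - a(1-\alpha)Y'Y\right]^{\frac{1}{1-\alpha}} dY\right\}V^{1/2}.
\end{aligned}$$

Note that $YY'$ is a $q \times q$ matrix where the $(i, j)$th element is $y_i y_j$. For $i \ne j$ the integral
over $Y$ is zero since $y_i y_j [1 - a(1-\alpha)Y'Y]^{\frac{1}{1-\alpha}}$ is an odd function in $y_i$ as well as in $y_j$. The
diagonal elements of $YY'$ are $y_1^2, \ldots, y_q^2$. The integral over one of them will be of the form

$$\begin{aligned}
&\int_Y y_1^2 \left[1 - a(1-\alpha)Y'Y\right]^{\frac{1}{1-\alpha}} dY \quad \text{for } 1 - a(1-\alpha)Y'Y > 0 \text{ when } \alpha < 1 \\
&= 2^q \int \cdots \int y_1^2 \left[1 - a(1-\alpha)Y'Y\right]^{\frac{1}{1-\alpha}} dY \quad \text{for } y_j > 0,\ j = 1, \ldots, q,\ \alpha < 1,\ 1 - a(1-\alpha)(y_1^2 + \cdots + y_q^2) > 0 \\
&= \frac{1}{[a(1-\alpha)]^{\frac{q}{2}+1}} \int \cdots \int u_1^{\frac{3}{2}-1} u_2^{\frac{1}{2}-1} \cdots u_q^{\frac{1}{2}-1} (1 - u_1 - \cdots - u_q)^{\frac{1}{1-\alpha}}\, du_1 \wedge \cdots \wedge du_q \\
&= \frac{\frac{1}{2}[\Gamma(\frac{1}{2})]^q\, \Gamma\!\left(\frac{1}{1-\alpha} + 1\right)}{[a(1-\alpha)]^{\frac{q}{2}+1}\, \Gamma\!\left(\frac{1}{1-\alpha} + 1 + \frac{q}{2} + 1\right)},
\end{aligned}$$

by using a type-1 Dirichlet integral. Now, substituting for $A$ from (3.6), we have

$$\mathrm{Cov}(X) = \frac{1}{2a(1-\alpha)\left[\frac{1}{1-\alpha} + 1 + \frac{q}{2}\right]}\, V = \frac{1}{2a\left[1 + (1-\alpha)\left(1 + \frac{q}{2}\right)\right]}\, V, \quad \alpha < 1. \tag{3.9}$$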

This is an interesting result because the covariance matrix of $X$ is not the
parameter matrix $V$ in the model (3.2) and (3.6). For $\alpha > 1$, proceeding as before, one
has

$$\mathrm{Cov}(X) = \frac{1}{2a\left[1 - (\alpha-1)\left(\frac{q}{2} + 1\right)\right]}\, V \tag{3.10}$$

for $\alpha > 1$, $1 - (\alpha - 1)\left(\frac{q}{2} + 1\right) > 0$, which implies $1 < \alpha < 1 + \frac{1}{\frac{q}{2}+1}$. Observe that when
$a = \frac{1}{2}$ and $\alpha \to 1$ then (3.9) and (3.10) give the covariance matrix as $V$, which agrees with
the multivariate Gaussian density. Hence the pathway for the covariance matrix is given
in (3.9) and (3.10).
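The scalar factor multiplying $V$ in (3.9) can be verified by direct quadrature in the simplest setting. The sketch below takes $q = 1$ and $V = 1$, integrates the kernel of (3.2) over its support, and compares the resulting variance with the closed form $1/(2a[1 + (1-\alpha)(1 + q/2)])$ for $\alpha < 1$. The midpoint-rule integrator, the function names, and the sample values of $\alpha$ and $a$ are assumptions for this illustration.

```python
import math

def integrate(g, lo, hi, n=20000):
    """Composite midpoint rule for a scalar function g on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

def variance_factor_formula(alpha, a, q):
    """Scalar c with Cov(X) = c * V according to (3.9), valid for alpha < 1."""
    return 1.0 / (2.0 * a * (1.0 + (1.0 - alpha) * (1.0 + 0.5 * q)))

def variance_factor_numeric(alpha, a):
    """The same factor for q = 1, V = 1, by quadrature of the kernel in (3.2)."""
    c = a * (1.0 - alpha)
    edge = 1.0 / math.sqrt(c)                 # support is |y| < edge when alpha < 1
    kernel = lambda y: max(1.0 - c * y * y, 0.0) ** (1.0 / (1.0 - alpha))
    mass = integrate(kernel, -edge, edge)     # equals 1/A for q = 1, V = 1
    second = integrate(lambda y: y * y * kernel(y), -edge, edge)
    return second / mass

for alpha in (0.0, 0.5, 0.9):
    print(alpha, variance_factor_numeric(alpha, 0.5), variance_factor_formula(alpha, 0.5, 1))
```

Taking the ratio of the two integrals removes the need to know the normalizing constant, so the check depends only on the kernel and on (3.9).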

3.3 The pathway surface 

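Let us look into the pathway model for the standard case. That is, for $\alpha < 1$,

$$f(Y) = \frac{[a(1-\alpha)]^{q/2}\, \Gamma\!\left(\frac{1}{1-\alpha} + 1 + \frac{q}{2}\right)}{\Gamma\!\left(\frac{1}{1-\alpha} + 1\right) \pi^{q/2}} \left[1 - a(1-\alpha)(y_1^2 + \cdots + y_q^2)\right]^{\frac{1}{1-\alpha}},$$

$1 - a(1-\alpha)(y_1^2 + \cdots + y_q^2) > 0$. This is plotted for $q = 2$, $a = 1$ and for $\alpha = -0.5, 0, 0.5$.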




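For $\alpha > 1$,

$$f(Y) = \frac{[a(\alpha-1)]^{q/2}\, \Gamma\!\left(\frac{1}{\alpha-1}\right)}{\Gamma\!\left(\frac{1}{\alpha-1} - \frac{q}{2}\right) \pi^{q/2}} \left[1 + a(\alpha-1)(y_1^2 + \cdots + y_q^2)\right]^{-\frac{1}{\alpha-1}}, \quad \frac{1}{\alpha-1} - \frac{q}{2} > 0.$$

This is plotted for $q = 2$, $a = 1$, and for $\alpha = 1.1, 1.5, 1.7$.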
For $\alpha \to 1$,

$$f(Y) = \frac{a^{q/2}}{\pi^{q/2}}\, e^{-a(y_1^2 + \cdots + y_q^2)}.$$

This is plotted for $a = 1$.

Fig. 3



The nature of the pathway surface when $\alpha$ moves from -0.5 to 1 can be seen from Figures
1a-1c and Figure 3. The nature of the movement when $\alpha$ moves from 1 to 1.7 can be seen
from Figure 3 and Figures 2a-2c.
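For readers who wish to regenerate surfaces of this kind, the sketch below evaluates the standard pathway density as a function of the squared radius $y_1^2 + \cdots + y_q^2$ for the values of $\alpha$ mentioned above, with $q = 2$ and $a = 1$. It is a hedged illustration and not the code used for the figures: the function name `pathway_density` is an assumption, and the normalizing constants are taken from (3.6), (3.7), and the Gaussian case with $\mu = O$, $V = I$.

```python
import math

def pathway_density(r2, alpha, a=1.0, q=2):
    """Standard pathway density (mu = 0, V = I) at squared radius r2 = y_1^2 + ... + y_q^2."""
    half_q = 0.5 * q
    if alpha == 1.0:
        return (a / math.pi) ** half_q * math.exp(-a * r2)
    if alpha < 1.0:
        eta = 1.0 / (1.0 - alpha)
        log_norm = math.lgamma(eta + 1.0 + half_q) - math.lgamma(eta + 1.0)
    else:
        eta = 1.0 / (alpha - 1.0)
        if eta - half_q <= 0.0:
            raise ValueError("need 1/(alpha - 1) - q/2 > 0")
        log_norm = math.lgamma(eta) - math.lgamma(eta - half_q)
    log_norm += half_q * math.log(a * abs(1.0 - alpha) / math.pi)
    base = 1.0 - a * (1.0 - alpha) * r2
    if base <= 0.0:
        return 0.0                      # outside the support (alpha < 1 only)
    return math.exp(log_norm) * base ** (1.0 / (1.0 - alpha))

# Height at the origin and at squared radius 1 for each plotted alpha.
for alpha in (-0.5, 0.0, 0.5, 1.0, 1.1, 1.5, 1.7):
    print(alpha, pathway_density(0.0, alpha), pathway_density(1.0, alpha))
```

For $q = 2$ the Gamma ratios in both constants reduce to $a(2 - \alpha)/\pi$, so the peak height falls linearly in $\alpha$ across the whole pathway; this simplification is derived here as a check and is not stated in the text.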



4 Conclusions

The multivariate Gaussian density and its central place in the procedure of maximizing
a generalized entropic form of order $\alpha$ is the core result of this paper. It contributes to
an understanding of the different entropic forms and how they relate to each other through
the parameter $\alpha$ (Mathai and Rathie [1], Masi [8]). It makes visible the pathway from
type-1 beta, through Gaussian, to type-2 beta densities as they emerge depending on $\alpha$,
and shows the relation to the entropies of Boltzmann-Gibbs and Tsallis statistical mechanics
(Hilhorst and Schehr [9], Vignat and Plastino [10]). While the generalized entropic form
of order $\alpha$ may not have direct applications in statistical mechanics, it might be of interest
to information theory and to a better understanding of attempts to unify entropic
forms under either mathematical or physical principles. A graphical representation of the
pathway is given in Figures 1, 2, and 3.